66 research outputs found

    Caipirini: using gene sets to rank literature

    Background: Keeping up-to-date with bioscience literature is becoming increasingly challenging. Several recent methods help meet this challenge by allowing literature search to be launched based on lists of abstracts that the user judges to be 'interesting'. Some methods go further by allowing the user to provide a second input set of 'uninteresting' abstracts; these two input sets are then used to search and rank literature by relevance. In this work we present the service 'Caipirini' (http://caipirini.org) that also allows two input sets, but takes the novel approach of allowing ranking of literature based on one or more sets of genes. Results: To evaluate the usefulness of Caipirini, we used two test cases, one related to the human cell cycle, and a second related to disease defense mechanisms in Arabidopsis thaliana. In both cases, the new method achieved high precision in finding literature related to the biological mechanisms underlying the input data sets. Conclusions: To our knowledge Caipirini is the first service enabling literature search directly based on biological relevance to gene sets; thus, Caipirini gives the research community a new way to unlock hidden knowledge from gene sets derived via high-throughput experiments.

    Combination therapy in hypertension: An update

    Meticulous control of blood pressure is required in patients with hypertension to produce the maximum reduction in clinical cardiovascular end points, especially in patients with comorbidities like diabetes mellitus, where more aggressive blood pressure lowering might be beneficial. Recent clinical trials suggest that monotherapy is unlikely to control hypertension in most patients. Combination therapy may be theoretically favored by the fact that multiple factors contribute to hypertension, and achieving control of blood pressure with a single agent acting through one particular mechanism may not be possible. Regimens can be either fixed-dose combinations or drugs added sequentially, one after the other. Combining the drugs makes them available in a convenient dosing format and lowers the dose of each individual component, thus reducing side effects and improving compliance. Classes of antihypertensive agents that have been commonly used are angiotensin receptor blockers, thiazide diuretics, beta and alpha blockers, calcium antagonists and angiotensin-converting enzyme inhibitors. Thiazide diuretics and calcium channel blockers, as well as combinations that include renin-angiotensin-aldosterone system blockers, are effective in reducing BP. The majority of currently available fixed-dose combinations are diuretic-based. Combinations may be individualized according to the presence of comorbidities like diabetes mellitus, chronic renal failure, heart failure and thyroid disorders, and for special population groups like the elderly and pregnant women.

    Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

    We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches were comprised of five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models, BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts.
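
As a rough illustration of two steps named in the abstract above (tf-idf cosine similarity, then keeping only the top-n similarities per document), the following is a minimal sketch. It is not the study's code: the function names, the plain `log(N/df)` weighting, and the toy token lists are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) from a list of token lists.

    Uses raw term frequency times log(N / document frequency); this
    weighting is an assumption, other tf-idf variants exist.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append(
            {t: term_freq[t] * math.log(n_docs / doc_freq[t]) for t in term_freq}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_n_similar(vectors, n):
    """For each document, keep the n most similar other documents.

    Returns one list per document of (other_index, similarity) pairs,
    ordered by decreasing similarity, ties broken by lower index.
    """
    result = []
    for i, u in enumerate(vectors):
        sims = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        sims.sort(key=lambda pair: (-pair[0], pair[1]))
        result.append([(j, s) for s, j in sims[:n]])
    return result


# Hypothetical toy corpus: three tokenized "records".
docs = [
    ["gene", "cluster"],
    ["gene", "cluster", "text"],
    ["blood", "pressure"],
]
vectors = tfidf_vectors(docs)
neighbours = top_n_similar(vectors, 1)
```

Here `neighbours[0]` holds the single nearest neighbour of the first record. At the scale described in the abstract, an all-pairs loop like this would be impractical, so the sketch only shows the idea.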

    The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text

    BACKGROUND: Determining usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation or measuring time savings by using automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. For this purpose, the BCIII-ACT corpus was provided, which includes a training, development and test set of over 12,000 PPI relevant and non-relevant PubMed abstracts labeled manually by domain experts, with the human classification times also recorded. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full text articles and interaction detection method ontology concepts that had been applied to detect the PPIs reported in them. RESULTS: A total of 11 teams participated in at least one of the two PPI tasks (10 in ACT and 8 in the IMT) and a total of 62 persons were involved either as participants or in preparing data sets/evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. From the 52 runs submitted for the ACT, the highest Matthews correlation coefficient (MCC) score measured was 0.55 at an accuracy of 89 and the best AUC iP/R was 68. Most ACT teams explored machine learning methods; some also used lexical resources like MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, and some integrated NER approaches. For the IMT, a total of 42 runs were evaluated by comparing systems against manually generated annotations done by curators from the BioGRID and MINT databases. The highest AUC iP/R achieved by any run was 53, the best MCC score 0.55. For competitive systems with an acceptable recall (above 35), the macro-averaged precision ranged between 50 and 80, with a maximum F-Score of 55. CONCLUSIONS: The results of the ACT task of BioCreative III indicate that classification of large unbalanced article collections reflecting the real class imbalance is still challenging. Nevertheless, text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than 1/4 of the time when compared to unranked results. Detecting associations between full text articles and interaction detection method PSI-MI terms (IMT) is more difficult than might be anticipated. This is due to the variability of method term mentions, errors resulting from pre-processing of articles provided as PDF files, and the heterogeneity and different granularity of method term concepts encountered in the ontology. However, combining the sophisticated techniques developed by the participants with supporting evidence strings derived from the articles for human interpretation could result in practical modules for biological annotation workflows.
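
The abstract above reports both accuracy and the Matthews correlation coefficient (MCC) for an unbalanced classification task. The sketch below uses the standard textbook definition of MCC from a binary confusion matrix to show why the two can diverge; it is not the task organizers' evaluation code, and the counts are made up for illustration.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here), e.g. when a classifier predicts only one class.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical unbalanced test set: 10 relevant and 90 non-relevant abstracts.
# A classifier that labels everything "non-relevant" looks accurate,
# but MCC shows it carries no information about the relevant class.
always_negative = dict(tp=0, tn=90, fp=0, fn=10)
perfect = dict(tp=10, tn=90, fp=0, fn=0)

acc_trivial = accuracy(**always_negative)
mcc_trivial = mcc(**always_negative)
mcc_perfect = mcc(**perfect)
```

With these invented counts the trivial classifier reaches 90% accuracy while its MCC is zero, which is one reason MCC is often preferred when classes are strongly imbalanced.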

    A new stress sensor and risk factor for suicide: The T allele of the functional genetic variant in the GABRA6 gene

    © 2017 The Author(s). Low GABA transmission has been reported in suicide, and the GABRA6 rs3219151 T allele has been associated with greater physiological and endocrine stress response in previous studies. Although environmental stress also plays a role in suicide, the possible role of this allele has not been investigated in this respect. In the present study, the effect of rs3219151 of the GABRA6 gene, in interaction with recent negative life events, on lifetime and current depression, current anxiety, and lifetime suicide was investigated using regression models in a white European general sample of 2283 subjects. Post hoc measures for phenotypes related to suicide risk were also tested for association with rs3219151 in interaction with environmental stress. No main effect of GABRA6 rs3219151 was detected, but in those exposed to recent negative life events the GABRA6 T allele increased current anxiety and depression as well as specific elements of suicide risk, including suicidal and death-related thoughts, hopelessness, restlessness and agitation, insomnia, and impulsiveness as measured by the STOP task. Our data indicate that stress-associated suicide risk is elevated in carriers of the GABRA6 rs3219151 T allele, with several independent markers and predictors of suicidal behaviours converging on this increased risk.

    Novel genetic associations for blood pressure identified via gene-alcohol interaction in up to 570K individuals across multiple ancestries

    Heavy alcohol consumption is an established risk factor for hypertension; the mechanism by which alcohol consumption impacts blood pressure (BP) regulation remains unknown. We hypothesized that a genome-wide association study accounting for gene-alcohol consumption interaction for BP might identify additional BP loci and contribute to the understanding of alcohol-related BP regulation. We conducted a large two-stage investigation incorporating joint testing of main genetic effects and single nucleotide variant (SNV)-alcohol consumption interactions. In Stage 1, genome-wide discovery meta-analyses in approximately 131K individuals across several ancestry groups yielded 3,514 SNVs (245 loci) with suggestive evidence of association (P < 1.0 × 10⁻⁵). In Stage 2, these SNVs were tested for independent external replication in individuals across multiple ancestries. We identified and replicated (at Bonferroni correction threshold) five novel BP loci (380 SNVs in 21 genes) and 49 previously reported BP loci (2,159 SNVs in 109 genes) in European ancestry, and in multi-ancestry meta-analyses (P < 5.0 × 10⁻⁸). For African ancestry samples, we detected 18 potentially novel BP loci (P < 5.0 × 10⁻⁸) in Stage 1 that warrant further replication. Additionally, correlated meta-analysis identified eight novel BP loci (11 genes). Several genes in these loci (e.g., PINX1, GATA4, BLK, FTO and GABBR2) have been previously reported to be associated with alcohol consumption. These findings provide insights into the role of alcohol consumption in the genetic architecture of hypertension.